## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
In this exercise we will apply the classic Kalman Filter (KF) algorithm to the Google Flu Trends data we previously used to explore the state-space model. Unlike the previous exercise that fit a single time-series, we’ll utilize the matrix version of the Kalman filter to look at the flu across New England.
In the multivariate version of the KF, the connection between state variables in the Analysis step is provided in two ways: (1) through interactions in the process model itself, \(MPM^T\), and (2) through the covariance in the process error, \(Q\). In this assignment we’ll assimilate all 4 combinations of with/without interactions in the process model versus with/without correlation in the process error to evaluate how each impacts the inferences made. Since the KF will always be following the data, where we’ll see the largest impact of these choices will be in the differences in the state uncertainties, especially in the periods of missing data.
To begin, let’s load and plot the flu data for New England.
## load the Google flu data & select states
gflu = read.csv("http://www.google.org/flutrends/about/data/flu/us/data.txt",skip=11)
time = as.Date(gflu$Date)
states = c("Massachusetts","Connecticut","Rhode.Island","New.Hampshire","Vermont","Maine")
nstates = length(states)
y = t(gflu[,states])
## plot time-series from states
plot(time,1:length(time),type='n',ylab="Flu Index",lwd=2,log='y',ylim=range(y,na.rm=TRUE))
for(i in 1:nstates){
lines(time,y[i,],col=i,lwd=2)
}
legend("topleft",legend=states,lwd=2,col=1:nstates)
Kalman does not estimate parameters, so we will used parameters that were previously estimated by fitting a state space model to the data. In a real-world situation you wouldn’t fit two models to the same data (double dipping!), but rather you could fit a state-space model to the previous data and then use an operational forecast moving forward. Alternatively, you might augment the state matrix in the KF to include both the model states and the model parameters. However, for the classic KF, this approach is limited to only being able to estimate parameters that can be written as linear models of the augmented state + variable matrix M. Therfore, you are limited to estimating variables in the process model, f(X), not the parameters in the Observation Error or Process Error matrices. For the Kalman Filter exercise today we will be using estimates of these variance parameters, not the states, to inform the KF. Keep in mind that the KF is now treating these as KNOWN and thus ignoring parameter uncertainty.
In our previous model we assumed a Random Walk which we just fit Massachussetts. For this version we’ll keep working with a Random Walk but we’ll need to add a spatial contagious process to the random-walk process model. In other words, we’re positing that part of the reason that we see such strong correlations across-states is that infected individuals are able to move across state boundaries and infect individuals in adjacent states. To run such a model we’ll need to define a matrix that defines the adjacency between states, where 1 = adjacent, 0 = not adjacent, and the states are in the order: Massachusetts, Connecticut, Rhode.Island, New.Hampshire, Vermont, Maine.
## define adjacency between states slected
adj = matrix(c(0,1,1,1,1,0, ### state-to-state spatial adjacency (self=0)
1,0,1,0,0,0,
1,1,0,0,0,0,
1,0,0,0,1,1,
1,0,0,1,0,0,
0,0,0,1,0,0),nstates,nstates,byrow=TRUE)
To be more specific, lets assume a simple flux process just based on adjacency, and ignore differences in how population size, border length, transporation corridors, etc. affect the movement of individuals among the New England states.
\(X_{i,t+1} = X_{i,t} + \alpha*\sum(adj_{i,j}*(X_{j,t}-X_{i,t}))+\epsilon_{i,t}\)
Thus, if state j has more cases than state i, this will tend to increase infection in state i. For your reference, below is the JAGS model fit to the log-transformed flu data
SpatialRandomWalk = "
model{
#### Data Model
for(t in 1:n){
for(i in 1:nstate){
y[i,t] ~ dnorm(x[i,t],tau_obs)
}
}
#### Process Model
for(t in 2:n){
for(i in 1:nstate){
mu[i,t] <- x[i,t-1] + alpha * sum(adj[i,1:nstate]*x[1:nstate,t-1])
}
x[1:nstate,t] ~ dmnorm(mu[1:nstate,t],Omega_proc)
}
#### Priors
for(i in 1:nstate){
x[i,1] ~ dnorm(x_ic,tau_ic)
}
tau_obs ~ dgamma(a_obs,r_obs)
Omega_proc ~ dwish(R,k)
alpha ~ dbeta(1,20)
}
"
And the parameters estimated from the model
## load parameters (assume known)
load("data/KFalpha.params.Rdata")
## observation error
tau_obs
## tau_obs
## 0.005094734
## process error covariance
knitr::kable(tau_proc,col.names = states)
| Massachusetts | Connecticut | Rhode.Island | New.Hampshire | Vermont | Maine |
|---|---|---|---|---|---|
| 0.0130656 | 0.0083375 | 0.0064081 | 0.0071263 | 0.0046522 | 0.0059065 |
| 0.0083375 | 0.0136074 | 0.0065342 | 0.0073813 | 0.0047711 | 0.0060086 |
| 0.0064081 | 0.0065342 | 0.0124185 | 0.0081431 | 0.0054649 | 0.0068645 |
| 0.0071263 | 0.0073813 | 0.0081431 | 0.0154499 | 0.0060621 | 0.0076979 |
| 0.0046522 | 0.0047711 | 0.0054649 | 0.0060621 | 0.0107488 | 0.0052666 |
| 0.0059065 | 0.0060086 | 0.0068645 | 0.0076979 | 0.0052666 | 0.0118565 |
## process error correlation
knitr::kable(cov2cor(tau_proc),col.names = states)
| Massachusetts | Connecticut | Rhode.Island | New.Hampshire | Vermont | Maine |
|---|---|---|---|---|---|
| 1.0000000 | 0.6252902 | 0.5030751 | 0.5015754 | 0.3925707 | 0.4745606 |
| 0.6252902 | 1.0000000 | 0.5026571 | 0.5090742 | 0.3945061 | 0.4730532 |
| 0.5030751 | 0.5026571 | 1.0000000 | 0.5878879 | 0.4730061 | 0.5657167 |
| 0.5015754 | 0.5090742 | 0.5878879 | 1.0000000 | 0.4704107 | 0.5687673 |
| 0.3925707 | 0.3945061 | 0.4730061 | 0.4704107 | 1.0000000 | 0.4665257 |
| 0.4745606 | 0.4730532 | 0.5657167 | 0.5687673 | 0.4665257 | 1.0000000 |
## process error SD
sqrt(diag(tau_proc))
## [1] 0.1143047 0.1166508 0.1114383 0.1242976 0.1036765 0.1088874
Now that we have estimates for our parameters, let’s write functions that evaluates the classic Kalman Filter. Note, if you were running the KF in ‘operational’ mode, where new data is arriving in real time, you wouldn’t run the function all at once but rather just call the KalmanAnalysis every time new data is observed, followed by KalmanForecast to make a new forecast.
##' Kalman Filter
##' @param M = model matrix
##' @param mu0 = initial condition mean vector
##' @param P0 = initial condition covariance matrix
##' @param Q = process error covariance matrix
##' @param R = observation error covariance matrix
##' @param Y = observation matrix (with missing values as NAs), time as col's
##'
##' @return list
##' mu.f, mu.a = state mean vector for (a)nalysis and (f)orecast steps
##' P.f, P.a = state covariance matrix for a and f
KalmanFilter <- function(M,mu0,P0,Q,R,Y){
## storage
nstates = nrow(Y)
nt = ncol(Y)
mu.f = matrix(NA,nstates,nt+1) ## forecast mean for time t
mu.a = matrix(NA,nstates,nt) ## analysis mean for time t
P.f = array(NA,c(nstates,nstates,nt+1)) ## forecast variance for time t
P.a = array(NA,c(nstates,nstates,nt)) ## analysis variance for time t
## initialization
mu.f[,1] = mu0
P.f[,,1] = P0
I = diag(1,nstates)
## run updates sequentially for each observation.
for(t in 1:nt){
## Analysis step: combine previous forecast with observed data
KA <- KalmanAnalysis(mu.f[,t],P.f[,,t],Y[,t],R,I)
mu.a[,t] <- KA$mu.a
P.a[,,t] <- KA$P.a
## Forecast step: predict to next step from current
KF <- KalmanForecast(mu.a[,t],P.a[,,t],M,Q)
mu.f[,t+1] <- KF$mu.f
P.f[,,t+1] <- KF$P.f
}
return(list(mu.f=mu.f,mu.a=mu.a,P.f=P.f,P.a=P.a))
}
##' Kalman Filter: Analysis step
##' @param mu.f = Forecast mean (vector)
##' @param P.f = Forecast covariance (matrix)
##' @param Y = observations, with missing values as NAs) (vector)
##' @param R = observation error covariance (matrix)
##' @param H = observation matrix (maps observations to states)
KalmanAnalysis <- function(mu.f,P.f,Y,R,H){
obs = !is.na(Y) ## which Y's were observed?
if(any(obs)){
H <- H[obs,] ## observation matrix
K <- P.f %*% t(H) %*% solve(H%*%P.f%*%t(H) + R[obs,obs]) ## Kalman gain
mu.a <- mu.f + K%*%(Y[obs] - H %*% mu.f) ## update mean
P.a <- (1-K %*% H)*P.f ## update covariance
} else {
##if there's no data, the posterior is the prior
mu.a = mu.f
P.a = P.f
}
return(list(mu.a=mu.a,P.a=P.a))
}
##' Kalman Filter: Forecast Step
##' @param mu.a = analysis posterior mean (vector)
##' @param P.a = analysis posterior covariance (matrix)
##' @param M = model (matrix)
##' @param Q = process error covariance (matrix)
KalmanForecast <- function(mu.a,P.a,M,Q){
mu.f = M%*%mu.a
P.f = Q + M*P.a*t(M)
return(list(mu.f=mu.f,P.f=P.f))
}
With the Kalman Filter function defined, we need to define the inputs to the function. Note below that I’m using the variable KF00 to store the outputs, where I’m using 00 to indicate that this run was done with the defaults for both the process model and process error covariance. In the assignment below you will rerun this analysis under a number of alternatives varying the process error and the magnitude of spatial flux in the process model.
## log transform data
Y = log10(y)
## options for process model
alpha = 0 ## assume no spatial flux
#alpha = 0.05 ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum)) ## random walk with flux
## options for process error covariance
Q = tau_proc ## full covariance matrix
#Q = diag(diag(Q)) ## diagonal covariance matrix
## observation error covariance (assumed independent)
R = diag(tau_obs,nstates)
## prior on first step, initialize with long-term mean and covariance
mu0 = apply(Y,1,mean,na.rm=TRUE)
P0 = cov(t(Y),use="pairwise.complete.obs")
## Run Kalman Filter
KF00 = KalmanFilter(M,mu0,P0,Q,R,Y)
After running the Kalman Filter, we can visualize the outputs. The first set of figures below shows the posterior analysis for each state through time. The second set shows the forecast and analysis standard deviations change through time, indicating when there is missing data in green on the bottom of the plot. As you can see the missing data is not synchronous across states, but the mean of the Analysis is influenced by the across-state covariances.
attach(KF00)
nt = length(time)
### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
lines(time,mu.a[i,],col=4)
lines(time,Y[i,])
}
## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
ylab="Std Error",type='l')
lines(time,sqrt(P.f[i,i,1:nt]),col=2)
points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag's the zero's
legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,1),cex=1.4)
}
Finally, to get a better idea about the dynamics of how the Kalman Filter works we can zoom in to a subset of time for one state and show the Forecast, Analysis, and observed data altogether.
## subset time
time2 <- time[time>as.Date("2015-01-01")]
tsel <- which(time %in% time2)
n = length(time2)*2
## interleave Forecast and Analysis
mu = p = rep(NA,n)
mu[seq(1,n,by=2)] = mu.f[1,tsel]
mu[seq(2,n,by=2)] = mu.a[1,tsel]
p[seq(1,n,by=2)] = 1.96*sqrt(P.f[1,1,tsel])
p[seq(2,n,by=2)] = 1.96*sqrt(P.a[1,1,tsel])
ci = cbind(mu-p,mu+p)
time3 = sort(c(time2,time2+1))
## plot Forecast, Analysis, and data
plot(time3,mu,ylim=range(ci),type='l')
ecoforecastR::ciEnvelope(time3,ci[,1],ci[,2],col="lightBlue")
lines(time3,mu,lwd=2)
points(time,Y[1,])
The assignment is to run the KF under all four combinations of covariance in the process model versus process error and compare the results. In particular you’ll want to pay attention to the missing data at the beginning of the timeseries for some states. You’ll also want to comment on how spatial adjacency affects the confidence in the inferences (some states are more isolated than others) in the four different scenarios. Finally, you’ll want to note that the alpha estimated from the data itself (0.000209), is close to zero and thus our real forecast would be much more like our no-flux run than our high flux run.
## options for process model
alpha = 0 ## assume no spatial flux
#alpha = 0.05 ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum)) ## random walk with flux
## options for process error covariance
#Q = tau_proc ## full covariance matrix
Q = diag(diag(Q)) ## diagonal covariance matrix
## Run Kalman Filter
KF01 = KalmanFilter(M,mu0,P0,Q,R,Y)
attach(KF01)
## The following objects are masked from KF00:
##
## mu.a, mu.f, P.a, P.f
nt = length(time)
### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
lines(time,mu.a[i,],col=4)
lines(time,Y[i,])
}
## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
ylab="Std Error",type='l')
lines(time,sqrt(P.f[i,i,1:nt]),col=2)
points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag's the zero's
legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,1),cex=1.4)
}
When alpha is set to 0, we are assuming certain correlations are set to zero, rather than using the calculated covariances of process error from the model (Box 13.1 from textbook). So when we run the model here, we are changing the covariance matrix to just the diagonals, but it doesn’t change the model - the covariances were already being ignored due to alpha = 0.
## options for process model
#alpha = 0 ## assume no spatial flux
alpha = 0.05 ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum)) ## random walk with flux
## options for process error covariance
#Q = tau_proc ## full covariance matrix
Q = diag(diag(Q)) ## diagonal covariance matrix
## Run Kalman Filter
KF11 = KalmanFilter(M,mu0,P0,Q,R,Y)
attach(KF11)
## The following objects are masked from KF01:
##
## mu.a, mu.f, P.a, P.f
## The following objects are masked from KF00:
##
## mu.a, mu.f, P.a, P.f
nt = length(time)
### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
lines(time,mu.a[i,],col=4)
lines(time,Y[i,])
}
## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
ylab="Std Error",type='l')
lines(time,sqrt(P.f[i,i,1:nt]),col=2)
points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag's the zero's
legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,1),cex=1.4)
}
Including spatial flux makes a big difference in our models. Because information is shared based on distance, areas where we have missing data for a state don’t have error that balloons to 1 - rather, the error decreases when a nearby state has data that can constrain our estimates. This is most obvious for Vermont, whose error decreases as soon as data is collected for New Hampshire, an adjacent state. The forecast itself then fluctuates with the seasons as other state forecasts do, whereas previously it was flat during any missing data period.
## options for process model
#alpha = 0 ## assume no spatial flux
alpha = 0.05 ## assume a large spatial flux
M = adj*alpha + diag(1-alpha*apply(adj,1,sum)) ## random walk with flux
## options for process error covariance
Q = tau_proc ## full covariance matrix
#Q = diag(diag(Q)) ## diagonal covariance matrix
## Run Kalman Filter
KF10 = KalmanFilter(M,mu0,P0,Q,R,Y)
attach(KF10)
## The following objects are masked from KF11:
##
## mu.a, mu.f, P.a, P.f
## The following objects are masked from KF01:
##
## mu.a, mu.f, P.a, P.f
## The following objects are masked from KF00:
##
## mu.a, mu.f, P.a, P.f
nt = length(time)
### plot ANALYSIS mean & CI time-series
par(mfrow=c(3,1))
for(i in 1:6){
ci = rbind(mu.a[i,]-1.96*sqrt(P.a[i,i,]),mu.a[i,]+1.96*sqrt(P.a[i,i,]))
plot(time,mu.a[i,],ylim=range(ci,na.rm=TRUE),type='n',main=states[i])
ecoforecastR::ciEnvelope(time,ci[1,],ci[2,],col="lightBlue")
lines(time,mu.a[i,],col=4)
lines(time,Y[i,])
}
## plot ANALYSIS and FORECAST variance time-series
par(mfrow=c(3,1))
for(i in 1:6){
plot(time,sqrt(P.a[i,i,]),ylim=c(0,sqrt(max(c(P.a[i,i,],P.f[i,i,])))),main=states[i],xlab="Time",
ylab="Std Error",type='l')
lines(time,sqrt(P.f[i,i,1:nt]),col=2)
points(time[is.na(Y[i,])],rep(0,nt)[is.na(Y[i,])],pch="*",col=3) ## flag's the zero's
legend("topright",legend=c("Analysis","Forecast","NAs"),col=1:3,lty=c(1,1,NA),pch=c(NA,NA,1),cex=1.4)
}
Including both spatial flux and process error covariance improves our model over either included individually. Including process error covariance is important because these covariances were calculated from the model itself, so they should be propagated into forecasts, since they can improve the error around both missing and present data for each state. We see this when we add the full covariance matrix to the model that previously only had the spatial flux. In Maine, for example, the mean forecast and the confidence intervals both are better approximations of the peaks and valleys from later years once the process error covariance allows for subtle relationships between all other states (rather than only adjacent states). This might be more pronounced if the covariances were higher, but they’re close to zero, so overall we don’t see a huge effect on the standard error estimates.
Observation error (R) should not be affected by a new covariate, unless it’s something that would affect our likelihood of observing infected individuals - i.e. maybe heat drives people to the hospital, so we are more likely to observe the individuals that are sick, whereas in colder temperatures they would stay home and not be observed.
When converting the Kalman Filter to an Extended Kalman Filter, we need to change the way we update the forecast mean. Rather than using the fully nonlinear model to update the forecast variance, we approximate a linear model using a Taylor series approach, calculated from the matrix of all pairs of derivatives for a given timepoint. This adds another computational step, and we also have to incorporate another process variance parameter, which can determine the significance of initial condition error in shaping our forecasts. Though a bit more complex, the EKF is a useful tool because nonlinear models are often much more realistic approximations of natural phenomena.